1 Introduction


Object recognition is one of the most ubiquitous problems in computer vision, arising in a wide
range of applications and tasks. As a consequence, considerable effort has been expended in
tackling variations of the problem, with a particular emphasis recently on model-based methods.
In the standard model-based approach, a set of stored geometric models is compared
against geometric features that have been extracted from an image of a scene (cf. [4, 11]).
Comparing a model with a set of image features generally involves finding a valid correspondence
between a subset of the model features and a subset of the image features. For a
correspondence to be valid, it is usually required that there exist some transformation of a
given type mapping each model feature (roughly) onto its corresponding image feature. This
transformation generally specifies the pose of the object: its position and orientation with
respect to the image coordinate system. The goal thus is to deduce the existence of a legal
transformation from model to image and to measure its 'quality'. In other words, the goal is
to determine whether there is an instance of the transformed object model in the scene, and
the extent of the model present in the data.

More formally, let $\{F_i \mid 1 \le i \le m\}$ be a set of model features measured in a coordinate
frame $\mathcal{M}$, let $\{f_j \mid 1 \le j \le s\}$ be a set of sensory features measured in a coordinate frame $\mathcal{S}$,
and let $T : \mathcal{M} \to \mathcal{S}$ denote a legal transformation from the model coordinate frame to the
sensor coordinate frame. The goal is to identify a correspondence, $I \in 2^{m \times s}$, that pairs model
features with sensor features. Each such correspondence $I$ specifies some transformation $T_I$
which maps each model feature close to its corresponding image feature.* That is,

$$I = \{(m_i, s_j) \mid \rho(T_I m_i, s_j) \le \epsilon\},$$

where $\rho$ is some appropriate measure (e.g. Euclidean distance in the case of point features, or
maximum Euclidean separation in the case of line features) and $\epsilon$ is some bound on uncertainty.
In general the quality of such an interpretation is measured in terms of the number of pairs
of model and image features, or the cardinality of $I$, $|I|$. The goal of recognition is generally
either to find the best interpretation, maximizing $|I|$, or all interpretations where $|I| > t$ for
some threshold $t$.
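As an illustration only, the definition of an interpretation can be sketched for point features, taking the measure to be Euclidean distance. The function name `interpretation`, the example transformation `shift`, and the sample coordinates are assumptions made for this sketch and do not come from the method under analysis.

```python
import math

def interpretation(model_pts, image_pts, transform, eps):
    """Return index pairs (i, j) such that the transformed model point i
    lies within Euclidean distance eps of image point j."""
    pairs = []
    for i, model_pt in enumerate(model_pts):
        tx, ty = transform(model_pt)
        for j, (sx, sy) in enumerate(image_pts):
            if math.hypot(tx - sx, ty - sy) <= eps:
                pairs.append((i, j))
    return pairs

# Hypothetical example: the transformation is a pure translation by (1, 0).
shift = lambda p: (p[0] + 1.0, p[1])
model = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
image = [(1.05, 0.0), (3.0, 0.1), (9.0, 9.0)]

pairs = interpretation(model, image, shift, eps=0.2)
quality = len(pairs)  # the cardinality of the interpretation
print(pairs, quality)
```

Here two of the three model points find a partner, so the quality of this interpretation is 2.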

Many of the approaches to the recognition problem can be distinguished by the manner in
which they search for solutions. One class of methods focuses on finding the correspondence
$I$, typically by searching a potentially exponential sized space of pairings of model and data
features (e.g. [5, 7, 10, 11]). A second class of methods focuses on finding the pose $T$, typically
by searching a potentially infinite resolution space of possible transformations (e.g. [2, 8, 9, 18,
19, 20, 21, 22]). A third class of methods is a hybrid of the other two, in that correspondences
of a small number of features are used to explicitly transform a model into image coordinates
(e.g., [1, 3, 6, 16, 17]), guiding further search for correspondences.

*A given interpretation $I$ will in fact generally define a range of 'equivalent' transformations, in the sense
that there are a number of transformations that generate the same set $I$.

We  are  primarily  interested  in  methods  in  the  second  and  third  classes,  because  they 
compute  transformations  from  model  coordinate  frame  to  image  coordinate  frame  using  a 
small  number  of  feature  pairs.  When  the  sensor  data  can  be  measured  exactly,  the  fact  that 
a  small  number  of  features  are  used  to  compute  a  pose  does  not  cause  problems.  For  real 
vision  systems,  however,  there  is  generally  uncertainty  in  measuring  the  locations  of  data 
features,  and  resulting  uncertainty  in  the  estimated  pose.  In  this  paper  we  develop  methods 
for  bounding  the  degree  of  uncertainty  in  a  three-dimensional  transformation  computed  from 
a  small  number  of  pairs  of  model  and  image  points.  The  specific  pose  estimation  method  that 
we investigate is that of [16, 17]; however, the results are very similar for a number of other
methods  (e.g.,  [23,  24]).  This  pose  estimation  method  uses  the  correspondence  of  3  model 
and  image  features  to  compute  the  three-dimensional  position  and  orientation  of  a  model  with 
respect  to  an  image,  under  a  ‘weak  perspective’  imaging  model  (orthographic  projection  plus 
scaling). 

The central idea of most pose-based recognition methods (such as alignment, geometric
hashing,  generalized  Hough  transform)  is  to  use  a  small  number  of  corresponding  model  and 
image  features  to  estimate  the  pose  of  an  object  acting  under  some  kind  of  transformation. 
The  methods  then  differ  in  terms  of  how  the  computed  pose  is  used  to  identify  possible 
interpretations  of  the  model  in  the  image.  The  pose  clustering  and  pose  hashing  methods 
compute  all  (or  many)  possible  transformations,  and  then  search  the  transformation  space  for 
clusters  of  similar  transformations.  In  contrast,  the  alignment  approaches  explicitly  transform 
the  model  into  the  image  coordinate  frame.  The  effects  of  pose  errors  in  these  two  cases  will 
be  different,  and  we  analyze  the  two  cases  separately. 

Implementation and testing of pose-based recognition methods have been reported in the
literature,  with  good  results.  An  open  issue,  however,  is  the  sensitivity  of  such  methods  to  noise 
in  the  sensory  data.  This  includes  both  the  range  of  uncertainty  associated  with  a  computed 
transformation,  and  the  effect  of  this  uncertainty  on  the  range  of  possible  positions  for  other 
aligned  model  features.  The  answers  to  these  questions  can  be  used  to  address  other  issues,  such 
as  analyzing  the  probability  of  false  positive  responses,  as  well  as  using  this  analysis  to  build 
accurate  verification  algorithms.  In  addressing  these  issues,  we  first  derive  expressions  for  the 
degree  of  uncertainty  in  computing  the  pose,  given  bounds  on  the  degree  of  sensing  uncertainty. 
Then we apply these results to analyze pose clustering methods and alignment methods. The
main difference between these methods is that the former operate explicitly in the
space of possible poses (or transformations), whereas the latter operate in the space
of image measurements.

Previous  Results 

Some  earlier  work  has  been  done  on  simpler  versions  of  these  questions,  such  as  recognition  that 
is  restricted  to  2D  objects  that  can  rotate  and  translate  in  the  image  plane  [12]  or  recognition 
that  is  restricted  to  2D  objects  that  can  be  arbitrarily  rotated  and  translated  in  3D,  then 
projected  into  the  image  plane  [14].  While  these  are  useful  for  analyzing  particular  cases 
of  recognition,  we  are  interested  in  extending  these  results  to  the  full  case  of  a  non-planar 
3D  object  undergoing  arbitrary  rigid  motion  (under  a  ‘weak  perspective’  imaging  model  of 
orthographic  projection  plus  scaling). 

2  Computing  3D  Pose  from  2D  data 

The pose estimation technique that we evaluate in detail is that of [16, 17], though similar results
hold for other techniques that use a small number of points to estimate three-dimensional
pose. This method of computing the pose operates as follows: We are given three image points
and three model points, each measured in their own coordinate system; the result is the translation,
scale and three-dimensional rotation that position the three model points in space such
that they map onto the image points under orthographic projection. The original specification
of this method assumes exact measurement of the image points is possible. In contrast, our
development here assumes that each image measurement is only known to within some uncertainty
disc of a given radius, $\epsilon$. We speak of the nominal measured image points, which are the
centers of these discs. The measured points can be used to compute an exact transformation,
and then we are concerned with the variations in this transformation as the locations of the
image points vary within their respective discs.

Let one of the measured image points be designated the origin of the points, represented by
the vector $\mathbf{o}$, measured in the image coordinate system. Let the relative vectors from this point
to the other two points be $\mathbf{m}$ and $\mathbf{n}$, also measured in image coordinates. Similarly, let $\mathbf{O}$, $\mathbf{M}$
and $\mathbf{N}$ denote the three model vectors corresponding to $\mathbf{o}$, $\mathbf{m}$ and $\mathbf{n}$, measured in a coordinate
system centered on the model. For convenience, we assume that the model coordinate origin
is in fact at $\mathbf{O}$ (see Figure 1a). We also assume that the model can be reoriented so that $\mathbf{M}$
and $\mathbf{N}$ lie in a plane parallel to the image plane. Note that we use the notation $\mathbf{x}$ for general
vectors, and $\hat{x}$ for unit vectors. Also note that we assume that the optic axis is along $\hat{z}$.

Figure 1: Computing the pose. Part a: A pair of basis vectors in the image have been selected, as well as
a pair of model basis vectors. The origin of the model basis is assumed to project to the origin of the image
coordinate system. Part b: After the translation, the origin point of the selected model basis projects to the
origin of the selected image basis. Part c: After the first rotation, the first model axis projects along the first
image axis. Part d: After the next two rotations, both model axes project along their corresponding image
axis, such that the ratios of the lengths of the axes are the same in the projected model and the image.

Our  version  of  the  pose  estimation  algorithm  is  summarized  below.  The  method  described 
in  [16]  is  similar,  but  for  2D  objects.  The  method  in  [17],  for  3D  objects,  is  more  direct  and 
appears  to  be  numerically  more  stable.  The  method  used  here,  however,  more  readily  lends 
itself  to  error  analysis  of  the  type  desired  (and  the  two  methods  are  equivalent  except  for 
numerical  stability  issues).  In  particular,  the  method  used  here  allows  us  to  isolate  each  of  the 
six  transformation  parameters  into  individual  steps  of  the  computation. 

For  the  exact  transformation  specified  by  the  measured  sensory  data,  the  steps  are: 

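1. Translate the model so that the origins align. A point $\mathbf{P}$ is then transformed to $\mathbf{P}'$ by:

$$\mathbf{P}' = \mathbf{P} + \mathbf{o} - \Pi_z \mathbf{O}$$

where $\Pi_z$ denotes projection along the $\hat{z}$ axis (see Figure 1b).

2. Rotate the model by an angle $\psi$ about the axis parallel to $\hat{z}$ and emanating from $\mathbf{O}'$ so
that $\Pi_z \mathbf{M}'$ lies along $\hat{m}$, leading to the transformation

$$\mathbf{P}'' = R_{\hat{z},\psi}\,\mathbf{P}'$$

where $R_{\hat{z},\psi}$ denotes a rotation of angle $\psi$ about the unit axis $\hat{z}$ (see Figure 1c).

3. Rotate the model by an angle $\theta$ about the new $\hat{M}''$, leading to the transformation

$$\mathbf{P}''' = R_{\hat{M}'',\theta}\,\mathbf{P}''.$$

4. Rotate the model by an angle $\phi$ about $\hat{m}^\perp = \hat{z} \times \hat{m}$ (Figure 1d), leading to

$$\mathbf{P}'''' = R_{\hat{m}^\perp,\phi}\,\mathbf{P}'''.$$

5. Scale by $s$ so that

$$s\,\Pi_z \mathbf{M}'''' = \mathbf{m}.$$

The constraints on the process are that $\mathbf{N}''''$ should project along $\hat{n}$ with scaled length
$s\,|\Pi_z \mathbf{N}''''| = n$, and $\mathbf{M}''''$ should project along $\hat{m}$ with scaled length $s\,|\Pi_z \mathbf{M}''''| = m$.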

Now suppose that we don't know the image points exactly, but only to within a disc of
radius $\epsilon$. We want to know the effect of this on the computed transformation, i.e. what is
the range of uncertainty in each of the transformation parameters if each of the image points
is allowed to vary over an $\epsilon$-disc? We divide this analysis as follows. First we consider the
transformation that aligns the model origin and the two model vectors with the image origin
and the two image vectors. The translation is explicit in this computation, and we note that
its error is simply bounded by the image uncertainty, $\epsilon$. We then derive expressions for the
remaining transformation parameters, $\psi$, $\theta$, $\phi$ and $s$, which are only implicit in the alignment
of the model vectors with the image vectors. Given these expressions we are then able to
characterize the effects of sensor uncertainty on these parameters.

3  Aligning  the  Basis  Vectors 

First we note that the translation which brings the model origin into correspondence with the
image origin, as specified in Step 1 of the method, simply has an uncertainty of $\epsilon$, the sensor
uncertainty.

We have some freedom in choosing the model coordinate system, and in particular, we
choose the coordinate frame such that both $\mathbf{M}$ and $\mathbf{N}$ are orthogonal to the optic axis $\hat{z}$. In
this case, given the measured image data, the angle $\psi$ is given by:

$$\langle \hat{M}, \hat{m} \rangle = \cos\psi \tag{1}$$

$$\langle \hat{M} \times \hat{m}, \hat{z} \rangle = -\langle \hat{M}, \hat{m}^\perp \rangle = \sin\psi. \tag{2}$$


Because there will be some uncertainty in computing $\psi$ due to the uncertainty $\epsilon$ in the
image points, the first rotation, using Rodrigues' formula, transforms a vector into

$$\mathbf{P}'' = \cos(\psi + \delta\psi)\,\mathbf{P}' + (1 - \cos(\psi + \delta\psi))\,\langle \mathbf{P}', \hat{z} \rangle\,\hat{z} + \sin(\psi + \delta\psi)\,(\hat{z} \times \mathbf{P}'). \tag{3}$$

By this we mean that $\psi$ denotes the nominal correct rotation (i.e. the rotation that correctly
aligns the model with the measured image data, without the uncertainty bounds) and $\delta\psi$
denotes the deviation in angle that could result from the $\epsilon$-bounded uncertainty in measuring
the position of the image point.

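If we use the small angle approximation for $\delta\psi$, by assuming $|\delta\psi| \ll 1$, then we have

$$\mathbf{P}'' \approx R_{\hat{z},\psi}\,\mathbf{P}' + \delta\psi\;\hat{z} \times \left(R_{\hat{z},\psi}\,\mathbf{P}'\right). \tag{4}$$

For the special case of $\mathbf{P}' = \hat{M}$, we have

$$\hat{M}'' \approx R_{\hat{z},\psi}\,\hat{M} + \delta\psi\,R_{\hat{z},\psi+\frac{\pi}{2}}\,\hat{M} \approx \hat{m} + \delta\psi\,\hat{m}^\perp. \tag{5}$$

Note that the right hand side is not a unit vector, but to a first order approximation, the
expression is sufficient, since we have assumed that $|\delta\psi| \ll 1$. Also note that this assumption is
reasonable, so long as we place lower limits on the length of an acceptable image basis vector,
i.e. we ensure that the length of the vector separating two basis points in the image is much
greater than $\epsilon$.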
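The small angle step can be checked numerically. The sketch below is an illustration rather than part of the original analysis: it implements Rodrigues' formula for a unit axis and compares an exact rotation by the perturbed angle about the optic axis with the first order expansion in the angular deviation. The helper names (`rodrigues`, `first_order`) and the sample vector are assumptions chosen for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rodrigues(p, axis, angle):
    """Rotate p about the unit vector `axis` by `angle` (Rodrigues' formula)."""
    c, s = math.cos(angle), math.sin(angle)
    d = dot(p, axis)
    k = cross(axis, p)
    return tuple(c * p[i] + (1.0 - c) * d * axis[i] + s * k[i] for i in range(3))

Z = (0.0, 0.0, 1.0)

def first_order(p, psi, dpsi):
    """Nominal rotation by psi plus the first order correction dpsi * (z x R p)."""
    r = rodrigues(p, Z, psi)
    k = cross(Z, r)
    return tuple(r[i] + dpsi * k[i] for i in range(3))

p = (1.0, 2.0, 0.5)          # arbitrary sample vector
psi, dpsi = 0.7, 0.01        # nominal angle and a small deviation
exact = rodrigues(p, Z, psi + dpsi)
approx = first_order(p, psi, dpsi)
err = max(abs(e - a) for e, a in zip(exact, approx))
print(err)  # second order in dpsi
```

The residual scales with the square of the deviation, which is why the approximation is acceptable only when the image basis vector is much longer than the sensing uncertainty.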

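The second rotation has two components of uncertainty:

$$\mathbf{P}''' = R_{\hat{m} + \delta\psi\,\hat{m}^\perp,\;\theta + \delta\theta}\,\mathbf{P}''. \tag{6}$$

We could expand this out using Rodrigues' formula, and keep only first order terms in $\delta\psi$ and
$\delta\theta$, under a small angle assumption. Unfortunately, we have found experimentally that while
we can safely assume that $\delta\psi$ is small, we cannot safely make the same assumptions about $\delta\phi$
or $\delta\theta$. Intuitively this makes sense, because the remaining two rotations $\phi$ and $\theta$ are the slant
and tilt of the model with respect to the image, and small changes in the image may cause
large changes in these rotations. Thus, we need to keep the full trigonometric expressions:

$$\mathbf{P}''' \approx R_{\hat{m},\theta+\delta\theta}\,\mathbf{P}'' + \delta\psi\,(1 - \cos(\theta + \delta\theta))\left[\langle \mathbf{P}'', \hat{m}^\perp \rangle\,\hat{m} + \langle \mathbf{P}'', \hat{m} \rangle\,\hat{m}^\perp\right] + \delta\psi\,\sin(\theta + \delta\theta)\,(\hat{m}^\perp \times \mathbf{P}''). \tag{7}$$

By a similar reasoning, the third rotation gives:

$$\mathbf{P}'''' = R_{\hat{m}^\perp,\;\phi+\delta\phi}\,\mathbf{P}'''. \tag{8}$$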
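Now suppose we decompose a point $\mathbf{P}'$ into the natural coordinate system:

$$\mathbf{P}' = \alpha\,\hat{m} + \beta\,\hat{m}^\perp + \gamma\,\hat{z}.$$

Then, since the rotations are linear, this point is transformed to

$$\mathbf{P}'''' = \alpha\,\hat{m}'''' + \beta\,\hat{m}^{\perp\prime\prime\prime\prime} + \gamma\,\hat{z}''''.$$

Thus, to see the representation for the transformed point, we simply need to trace through
the transformations associated with each of the basis vectors.

Using equations (4), (7) and (8), we find that rotating the basis vectors, then projecting
the result into the image plane (i.e. along $\hat{z}$) yields (keeping only first order terms in $\delta\psi$)

$$\Pi_z \hat{m}'''' = \left[\cos(\phi+\delta\phi)\{\cos\psi - \delta\psi\sin\psi\cos(\theta+\delta\theta)\} + \sin\psi\sin(\theta+\delta\theta)\sin(\phi+\delta\phi)\right]\hat{m} + \left[\delta\psi\cos\psi + \sin\psi\cos(\theta+\delta\theta)\right]\hat{m}^\perp \tag{9}$$

$$\Pi_z \hat{m}^{\perp\prime\prime\prime\prime} = \left[\cos(\phi+\delta\phi)\{-\sin\psi - \delta\psi\cos\psi\cos(\theta+\delta\theta)\} + \cos\psi\sin(\theta+\delta\theta)\sin(\phi+\delta\phi)\right]\hat{m} + \left[-\delta\psi\sin\psi + \cos\psi\cos(\theta+\delta\theta)\right]\hat{m}^\perp \tag{10}$$

$$\Pi_z \hat{z}'''' = \left[\delta\psi\cos(\phi+\delta\phi)\sin(\theta+\delta\theta) + \cos(\theta+\delta\theta)\sin(\phi+\delta\phi)\right]\hat{m} - \sin(\theta+\delta\theta)\,\hat{m}^\perp. \tag{11}$$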
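As a sanity check on the first of these projected basis vectors, the following sketch composes the three rotations exactly and compares the image-plane components with the first order expression of equation (9). It is an illustrative check under stated assumptions: the image basis direction is placed along the x axis, its perpendicular along the y axis, and the perturbed second rotation axis is normalized before use. All function names and numerical values are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rodrigues(p, axis, angle):
    """Rotate p about the unit vector `axis` by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    d = dot(p, axis)
    k = cross(axis, p)
    return tuple(c * p[i] + (1.0 - c) * d * axis[i] + s * k[i] for i in range(3))

M_HAT, M_PERP, Z = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

def exact_projection(psi, theta, phi, dpsi, dtheta, dphi):
    """Apply the three perturbed rotations to m_hat and project along z."""
    v = rodrigues(M_HAT, Z, psi + dpsi)
    norm = math.hypot(1.0, dpsi)
    axis = (1.0 / norm, dpsi / norm, 0.0)   # m_hat + dpsi * m_perp, normalized
    v = rodrigues(v, axis, theta + dtheta)
    v = rodrigues(v, M_PERP, phi + dphi)
    return v[0], v[1]

def first_order_projection(psi, theta, phi, dpsi, dtheta, dphi):
    """Components along (m_hat, m_perp) from the first order formula (9)."""
    t, f = theta + dtheta, phi + dphi
    along_m = (math.cos(f) * (math.cos(psi) - dpsi * math.sin(psi) * math.cos(t))
               + math.sin(psi) * math.sin(t) * math.sin(f))
    along_perp = dpsi * math.cos(psi) + math.sin(psi) * math.cos(t)
    return along_m, along_perp

args = (0.5, 0.4, 0.6, 0.01, 0.3, -0.2)   # dtheta and dphi need not be small
ex = exact_projection(*args)
fo = first_order_projection(*args)
diff = max(abs(ex[0] - fo[0]), abs(ex[1] - fo[1]))
print(diff)  # second order in dpsi
```

Note that the deviations in the slant and tilt angles are deliberately not small in this example, consistent with keeping their full trigonometric form.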

Thus  far  we  have  seen  how  to  align  the  basis  vectors  of  the  model  with  the  basis  vectors 
measured  in  the  image,  by  a  combination  of  two-dimensional  translation  and  three-dimensional 
rotation.  Before  we  can  analyze  the  effects  of  uncertainty  on  the  rotation  and  scale  parameters, 
we  need  to  derive  expressions  for  them. 

4  Computing  the  Implicit  Parameters 

Now we consider how to compute the remaining parameters of the transformation, and characterize
the effects of uncertainty on these parameters. There are some special cases of the
transformation that are of particular importance to us. First, consider the case of $\mathbf{P}' = \mathbf{M}$.
We have chosen the model coordinate system so that in this case

$$\alpha = M\cos\psi, \qquad \beta = -M\sin\psi, \qquad \gamma = 0$$

and thus

$$s\,\Pi_z \mathbf{M}'''' = sM\cos(\phi + \delta\phi)\,\hat{m} + sM\,\delta\psi\,\hat{m}^\perp, \tag{12}$$

where $M$ is the length of the vector $\mathbf{M}$.
If we first consider the transformation that aligns the model with the measured image
points (not accounting for the uncertainty in the image measurements), we want the scaled
result to align properly, i.e.

$$sM\cos\phi\,\hat{m} = m\,\hat{m}.$$

This simply requires that

$$s = \frac{m}{M}\sec\phi. \tag{13}$$

Note that we only want positive scale factors, so we can assume that $\sec\phi \ge 0$, or that

$$-\frac{\pi}{2} \le \phi \le \frac{\pi}{2}.$$

The error vector associated with the transformed $\mathbf{M}$ is then

$$\mathbf{E}_M = m\sec\phi\,\delta\psi\,\hat{m}^\perp + m\left[\cos\delta\phi - \tan\phi\sin\delta\phi - 1\right]\hat{m}. \tag{14}$$

We can use this result to put constraints on the range of feasible transformations, and then
on the range of feasible positions for other aligned points.

Since there is an $\epsilon$-disc of error associated with the two endpoints of $\mathbf{m}$, there will be a
range of acceptable projections. One way to constrain this range is to note that the magnitude
of the error vector should be no more than $2\epsilon$. (This is because the translation to align origins
has an uncertainty disc of $\epsilon$ and the actual position of the endpoint also has an uncertainty
disc of $\epsilon$.) This then implies that

$$|\mathbf{E}_M|^2 \le 4\epsilon^2$$

or, with substitutions, that

$$\left[\cos\delta\phi - \tan\phi\sin\delta\phi - 1\right]^2 + (1 + \tan^2\phi)(\delta\psi)^2 \le \frac{4\epsilon^2}{m^2}. \tag{15}$$


We could also derive slightly weaker constraints, by requiring that the components of the
error vector in each of the directions $\hat{m}$ and $\hat{m}^\perp$ be of magnitude at most $2\epsilon$. In this case,
we are effectively using a bounding box rather than a bounding disc of uncertainty. This leads
to the following constraints:

$$|\delta\psi| \le \frac{2\epsilon}{m}\,|\cos\phi| \tag{16}$$

$$\left|\cos\delta\phi - 1 - \tan\phi\sin\delta\phi\right| \le \frac{2\epsilon}{m}. \tag{17}$$


We need to do the same thing with $\mathbf{N}$ and $\mathbf{n}$. As noted earlier, we can choose the model
coordinate system so that $\mathbf{N}$ has no $z$ component. This means we can represent $\mathbf{N}$ by

$$\mathbf{N} = N\cos\xi\,\hat{M} + N\sin\xi\,\hat{M}^\perp$$

where $\xi$ is a known angle. Similarly, we can represent the corresponding image vector by

$$\mathbf{n} = n\cos\omega\,\hat{m} + n\sin\omega\,\hat{m}^\perp$$

where $\omega$ is a measurable angle. This means that the nominal (i.e. no error) projected transformation
of $\mathbf{N}$ is:

$$\frac{m}{\cos\phi}\frac{N}{M}\left[\cos\xi\cos\phi + \sin\xi\sin\phi\sin\theta\right]\hat{m} + \frac{m}{\cos\phi}\frac{N}{M}\sin\xi\cos\theta\;\hat{m}^\perp. \tag{18}$$

But in principle this is also equal to

$$\mathbf{n} = n\cos\omega\,\hat{m} + n\sin\omega\,\hat{m}^\perp$$


and by equating terms we have

$$\frac{m}{n}\frac{N}{M}\left[\cos\xi\cos\phi + \sin\xi\sin\phi\sin\theta\right] = \cos\phi\cos\omega \tag{19}$$

$$\frac{m}{n}\frac{N}{M}\left[\sin\xi\cos\theta\right] = \cos\phi\sin\omega. \tag{20}$$

These two equations define the set of solutions for the transformation parameters $\phi$ and $\theta$.
(Note that the set of solutions for $\psi$ is given by equations (1) and (2).) There are several ways
of solving these implicit equations in the variables of interest, namely $\phi$ and $\theta$. One way is
given below. First, let

$$\eta = \frac{m}{n}\frac{N}{M}.$$

We can rewrite these equations as explicit solutions for $\theta$:

$$\cos\theta = \frac{\sin\omega\cos\phi}{\eta\sin\xi}, \qquad \sin\theta = \frac{\cos\phi\,(\cos\omega - \eta\cos\xi)}{\eta\sin\xi\sin\phi}. \tag{21}$$

This gives us a solution for $\theta$ as a function of $\phi$. To isolate $\phi$, we use the fact that $\sin^2\theta + \cos^2\theta = 1$,
and this leads to the following equation:

$$\sin^2\omega\cos^4\phi - \left(\eta^2 + 1 - 2\eta\cos\omega\cos\xi\right)\cos^2\phi + \eta^2\sin^2\xi = 0. \tag{22}$$

This is a quadratic in $\cos^2\phi$, as a function of known quantities, and hence the two solutions
will yield a total of up to four solutions for $\cos\phi$. But since we want $s > 0$, we know that
$\cos\phi > 0$, and so at most two of these solutions are acceptable:

$$\cos\phi = \sqrt{\frac{1 - 2\eta\cos\omega\cos\xi + \eta^2 \pm \sqrt{(1 - 2\eta\cos\omega\cos\xi + \eta^2)^2 - 4\eta^2\sin^2\omega\sin^2\xi}}{2\sin^2\omega}}. \tag{23}$$
Note that this gives real solutions only if

$$\cos\omega\cos\xi \le \frac{1 + \eta^2}{2\eta}. \tag{24}$$

Also note that we need $\cos\phi \le 1$, so that if this does not hold in the equation, we can exclude
the cases as being impossible. If there are two real solutions for $\phi$, they can be used with
equation (21) to find solutions for $\theta$. Note that solutions to equation (22) will be stable only
when $\sin\omega$ is not small. This makes sense, since the unstable cases correspond to image bases
in which the basis vectors are nearly (anti-)parallel.
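The solution procedure of equations (21) to (23) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `solve_phi_theta` and the test values are assumptions, and the sketch also returns the mirror solution with the opposite sign of the slant angle, which is one possible way of handling the fact that the equations constrain only its cosine. The example generates image measurements from a known pose using the nominal projection of the two basis vectors and then recovers that pose.

```python
import math

def solve_phi_theta(m, n, omega, M, N, xi):
    """Return candidate (phi, theta) pairs with 0 < cos(phi) < 1.

    m, n   : lengths of the image basis vectors; omega: angle between them
    M, N   : lengths of the model basis vectors; xi: angle between them
    Unstable when sin(omega) is close to zero, as noted in the text.
    """
    eta = (m * N) / (n * M)
    b = 1.0 - 2.0 * eta * math.cos(omega) * math.cos(xi) + eta * eta
    disc = b * b - 4.0 * (eta * math.sin(omega) * math.sin(xi)) ** 2
    if disc < 0.0:
        return []
    solutions = []
    for sign in (1.0, -1.0):
        cos2 = (b + sign * math.sqrt(disc)) / (2.0 * math.sin(omega) ** 2)
        if not 0.0 < cos2 < 1.0:
            continue  # also skips phi = 0, where (21) divides by sin(phi)
        base = math.acos(math.sqrt(cos2))
        for phi in (base, -base):
            cos_t = math.sin(omega) * math.cos(phi) / (eta * math.sin(xi))
            sin_t = (math.cos(phi) * (math.cos(omega) - eta * math.cos(xi))
                     / (eta * math.sin(xi) * math.sin(phi)))
            solutions.append((phi, math.atan2(sin_t, cos_t)))
    return solutions

# Hypothetical ground truth used to synthesize image measurements.
phi_true, theta_true, xi, M, N, s = 0.4, 0.3, 1.0, 2.0, 3.0, 1.5
m = s * M * math.cos(phi_true)
nx = s * N * (math.cos(xi) * math.cos(phi_true)
              + math.sin(xi) * math.sin(phi_true) * math.sin(theta_true))
ny = s * N * math.sin(xi) * math.cos(theta_true)
n, omega = math.hypot(nx, ny), math.atan2(ny, nx)

sols = solve_phi_theta(m, n, omega, M, N, xi)
scales = [(m / M) / math.cos(phi) for phi, _ in sols]  # s = (m / M) sec(phi)
print(sols, scales)
```

For these values only one root of the quadratic satisfies the cosine constraint, so the sketch returns the true pose together with its mirror image.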

The complete version of transforming $\mathbf{N}$ is given by:

$$s\,\Pi_z \mathbf{N}'''' = sN\left[\cos\xi\cos(\phi+\delta\phi) + \sin\xi\left\{\sin(\theta+\delta\theta)\sin(\phi+\delta\phi) - \delta\psi\cos(\theta+\delta\theta)\cos(\phi+\delta\phi)\right\}\right]\hat{m} + sN\left[\delta\psi\cos\xi + \sin\xi\cos(\theta+\delta\theta)\right]\hat{m}^\perp. \tag{25}$$

Similar to our analysis of the error associated with the transformed version of $\mathbf{M}$, we can
set the magnitude of the difference between this vector and the nominal vector to be less than
$2\epsilon$, or we can take the weaker constraints of requiring that the components of the error vector
in two orthogonal directions each have magnitude at most $2\epsilon$. One natural choice of directions
is $\hat{n}$ and $\hat{n}^\perp$, but a more convenient, and equally feasible, choice is $\hat{m}$ and $\hat{m}^\perp$.

In the latter case, bounding the component of the error in the direction of $\hat{m}^\perp$ yields

$$\left|\frac{N}{M}\sec\phi\left[\delta\psi\cos\xi + \sin\xi\cos(\theta+\delta\theta)\right] - \frac{n}{m}\sin\omega\right| \le \frac{2\epsilon}{m}. \tag{26}$$

The  nominal  transformation 


To summarize, we compute the nominal transformation, which is when the nominal image
points are correct, as follows:

1. Choose the coordinate system of the model so that the origin lies at $\mathbf{O}$, and so that $\mathbf{M}$
and $\mathbf{N}$ both lie in the $z = 0$ plane.

2. Translate the model so that the origins align ($\Pi_z$ denotes projection along the $\hat{z}$ axis):

$$\mathbf{P}' = \mathbf{P} + \mathbf{o} - \Pi_z \mathbf{O}.$$

3. Rotate the model by angle $\psi$ about $\hat{z}$ so that $\Pi_z \hat{M}$ lies along $\hat{m}$. $\psi$ is given by:

$$\langle \Pi_z \hat{M}, \hat{m} \rangle = \cos\psi$$

$$\langle \hat{M} \times \hat{m}, \hat{z} \rangle = -\langle \hat{M}, \hat{m}^\perp \rangle = \sin\psi.$$

4. Rotate the model by angle $\theta$ about the newly resulting $\hat{M}$, which in this case will be $\hat{m}$.

5. Rotate the model by angle $\phi$ about $\hat{m}^\perp$. The angles $\phi$ and $\theta$ are found by solving

$$\cos\phi = \sqrt{\frac{1 - 2\eta\cos\omega\cos\xi + \eta^2 \pm \sqrt{(1 + \eta^2 - 2\eta\cos\omega\cos\xi)^2 - 4\eta^2\sin^2\omega\sin^2\xi}}{2\sin^2\omega}}$$

for $\phi$, and then solving

$$\cos\theta = \frac{\sin\omega\cos\phi}{\eta\sin\xi}, \qquad \sin\theta = \frac{\cos\phi\,(\cos\omega - \eta\cos\xi)}{\eta\sin\xi\sin\phi}$$

for $\theta$, where

$$\eta = \frac{m}{n}\frac{N}{M}$$

and where only solutions for which $0 \le \cos\phi \le 1$ are kept.

6. Project into the image, and scale by

$$s = \frac{m}{M}\sec\phi.$$

5  Uncertainty  in  the  Implicit  Parameters 

Now we turn to the problem of bounding the range of uncertainty associated with the rotation
and scale parameters of the transformation, given that there are $\epsilon$-bounded positional errors
in the measurement of the image points.

To bound the rotational uncertainties, we will start with equation (15):

$$\left[\cos\delta\phi - \tan\phi\sin\delta\phi - 1\right]^2 + (1 + \tan^2\phi)(\delta\psi)^2 \le \frac{4\epsilon^2}{m^2}.$$

From this, a straightforward bound on the uncertainty in $\psi$ is

$$|\delta\psi| \le \frac{2\epsilon}{m}\,|\cos\phi| = \frac{2\epsilon}{m}\cos\phi.$$

To solve for bounds on $\delta\phi$, we use the following analysis. Given a value for $\delta\psi$, we have the
implicit equation

$$\cos\delta\phi - 1 - \tan\phi\sin\delta\phi = \mu \tag{27}$$

where $\mu$ can range over:

$$-\sqrt{\frac{4\epsilon^2}{m^2} - \sec^2\phi\,(\delta\psi)^2} \;\le\; \mu \;\le\; \sqrt{\frac{4\epsilon^2}{m^2} - \sec^2\phi\,(\delta\psi)^2}.$$
We can turn this into a quadratic equation in $\cos\delta\phi$ and into a quadratic equation in
$\sin\delta\phi$, leading to two solutions for each variable, and hence a total of four possible solutions.
By enforcing agreement with the implicit equation, we can reduce this to a pair of solutions:

$$\sin\delta\phi = -(1 + \mu)\sin\phi\cos\phi + \sigma\cos\phi\sqrt{1 - (1 + \mu)^2\cos^2\phi} \tag{28}$$

$$\cos\delta\phi = (1 + \mu)\cos^2\phi + \sigma\sin\phi\sqrt{1 - (1 + \mu)^2\cos^2\phi} \tag{29}$$

where $\sigma = \pm 1$. Note that in order for these equations to yield a real result for $\delta\phi$, we need the
argument to the square root to be positive. This yields additional constraints on the range of
legitimate values for $\mu$.

These two equations implicitly define a range of solutions for $\delta\phi$, as $\mu$ varies. Since $\mu \ne -1$
and $\phi \ne \pm\frac{\pi}{2}$ (since this case corresponds to $m = 0$), we can actually simplify this to:

$$\tan\delta\phi = \frac{-\tan\phi + \sigma\sqrt{\dfrac{1}{(1+\mu)^2\cos^2\phi} - 1}}{1 + \sigma\tan\phi\sqrt{\dfrac{1}{(1+\mu)^2\cos^2\phi} - 1}}. \tag{30}$$

By substituting for the two values for $\sigma$ and by substituting for the limits on $\mu$, we can
obtain a range of values for $\delta\phi$. In fact, we get two ranges, one for the case of $\sigma = 1$ and one
for the case of $\sigma = -1$. Only one of these two ranges, in general, will contain 0, and since
this must hold, we can exclude the second range. In fact, when $\mu = 0$, $\tan\delta\phi = 0$, and if we
substitute these values into equation (30), we find that

$$\sigma = \operatorname{sgn}(\tan\phi) = \begin{cases} 1 & \text{if } \tan\phi \ge 0; \\ -1 & \text{if } \tan\phi < 0. \end{cases}$$

Note that we can simplify the implicit expressions for $\delta\phi$. If we let

$$\nu = \arccos\left[(1 + \mu)\cos\phi\right]$$

then

$$\delta\phi = \sigma\nu - \phi. \tag{31}$$
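The closed form in equation (31) makes the admissible range of the slant deviation easy to evaluate. The sketch below is an illustration under stated assumptions: the function name `dphi_range` is hypothetical, the sign is chosen from the tangent of the nominal angle as described above, and the argument of the arccosine is clamped to 1, which is one simple way to respect the additional constraint that keeps the solutions real. Near a nominal angle of zero both sign branches touch zero, which this sketch ignores.

```python
import math

def dphi_range(eps, m, phi, dpsi):
    """Range (lo, hi) of the deviation in phi for a given deviation dpsi.

    eps : radius of the image uncertainty disc
    m   : length of the image basis vector
    phi : nominal slant angle, with |phi| < pi/2
    """
    radicand = (2.0 * eps / m) ** 2 - (dpsi / math.cos(phi)) ** 2
    if radicand < 0.0:
        raise ValueError("dpsi lies outside its admissible bound")
    mu_max = math.sqrt(radicand)
    sigma = 1.0 if math.tan(phi) >= 0.0 else -1.0
    ends = []
    for mu in (-mu_max, mu_max):
        c = min(1.0, max(-1.0, (1.0 + mu) * math.cos(phi)))  # keep arccos real
        ends.append(sigma * math.acos(c) - phi)
    return min(ends), max(ends)

# Hypothetical numbers: one pixel of uncertainty, a 50 pixel basis vector.
lo, hi = dphi_range(eps=1.0, m=50.0, phi=0.5, dpsi=0.0)
print(lo, hi)
```

Because the arccosine is monotonic, evaluating the two extreme values of the slack variable is enough to obtain the end points of the range.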

To solve for bounds on $\delta\theta$, we do a similar analysis, using equation (26). We have the
implicit equation

$$\frac{N}{M}\sec\phi\left[\delta\psi\cos\xi + \sin\xi\cos(\theta + \delta\theta)\right] - \frac{n}{m}\sin\omega = \mu \tag{32}$$

where

$$-\frac{2\epsilon}{m} \le \mu \le \frac{2\epsilon}{m}.$$

We can write this equation in the form:

$$\cos\theta\cos\delta\theta - \sin\theta\sin\delta\theta = a$$

and using exactly the same kind of analysis, we can solve for $\sin\delta\theta$ and $\cos\delta\theta$:

$$\cos\delta\theta = a\cos\theta + \sigma\sin\theta\sqrt{1 - a^2} \tag{33}$$

$$\sin\delta\theta = -a\sin\theta + \sigma\cos\theta\sqrt{1 - a^2} \tag{34}$$

where

$$\sigma = \pm 1$$

$$a = \frac{nM\sin\omega}{mN\sin\xi}\cos\phi - \frac{\delta\psi}{\tan\xi} + \mu\,\frac{M\cos\phi}{N\sin\xi}$$

$$-\frac{2\epsilon}{m}\cos\phi \le \delta\psi \le \frac{2\epsilon}{m}\cos\phi$$

$$-\frac{2\epsilon}{m} \le \mu \le \frac{2\epsilon}{m}$$

and where $\mu$ is further constrained to keep the solutions real. Again, by substituting for the
two values for $\sigma$, by substituting for the limits on $\mu$, and by substituting for the limits on $\delta\psi$,
we can obtain a range of values for $\delta\theta$.

Similar to the $\delta\phi$ case, we actually get two ranges, one for the case of $\sigma = 1$ and one for
the case of $\sigma = -1$. Again, in general only one of these two ranges will span 0, and since this
must hold, we can automatically exclude the second range.

Also similar to the $\delta\phi$ case, we can simplify the expression for $\delta\theta$. In particular, if we let

$$\nu = \arccos a$$

then

$$\delta\theta = \sigma\nu - \theta. \tag{35}$$
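A corresponding sketch for the tilt deviation follows from equation (35). It is illustrative only: the function name `dtheta_range` is hypothetical, the sign is taken from the sign of the nominal tilt angle (an assumption that selects the branch spanning zero), and the quantity inside the arccosine is clamped to the interval from -1 to 1 to keep the result real. Since that quantity is linear in both the slack variable and the first rotation deviation, its extreme values are reached at the corners of their admissible intervals.

```python
import math

def dtheta_range(eps, m, n, M, N, omega, xi, phi, theta):
    """Range (lo, hi) of the deviation in theta implied by equations (32)-(35)."""
    k = 2.0 * eps / m
    # Nominal value of a; for consistent data it equals cos(theta).
    a0 = (n * M * math.sin(omega)) / (m * N * math.sin(xi)) * math.cos(phi)
    # Largest shift of a over |dpsi| <= k cos(phi) and |mu| <= k.
    spread = (abs(k * math.cos(phi) / math.tan(xi))
              + abs(k * M * math.cos(phi) / (N * math.sin(xi))))
    a_lo = max(-1.0, a0 - spread)
    a_hi = min(1.0, a0 + spread)
    sigma = 1.0 if theta >= 0.0 else -1.0  # assumed branch choice
    ends = [sigma * math.acos(a_lo) - theta, sigma * math.acos(a_hi) - theta]
    return min(ends), max(ends)

# Hypothetical pose used to synthesize consistent image measurements.
phi, theta, xi, M, N, s = 0.4, 0.3, 1.0, 2.0, 3.0, 1.5
m = s * M * math.cos(phi)
nx = s * N * (math.cos(xi) * math.cos(phi)
              + math.sin(xi) * math.sin(phi) * math.sin(theta))
ny = s * N * math.sin(xi) * math.cos(theta)
n, omega = math.hypot(nx, ny), math.atan2(ny, nx)

lo, hi = dtheta_range(0.01, m, n, M, N, omega, xi, phi, theta)
print(lo, hi)
```

With zero sensing uncertainty the range collapses to the single value zero, which is a quick consistency check on the inputs.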


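To bound the error in scale, we return to equation (12). If we let $\delta s$ denote the multiplicative
uncertainty in computing the scale factor $s$, i.e. the actual computation of scale is $s\,\delta s$, then
by equation (12), one inequality governing this uncertainty is

$$\left|\left[\frac{m\,\delta s\cos(\phi + \delta\phi)}{\cos\phi} - m\right]\hat{m} + \left[\frac{m\,\delta s\,\delta\psi}{\cos\phi}\right]\hat{m}^\perp\right|^2 \le 4\epsilon^2. \tag{36}$$

We can expand this out and solve for limits on $\delta s$, which are given by

$$\delta s \ge \frac{\cos(\phi+\delta\phi)\cos\phi - \cos\phi\sqrt{\frac{4\epsilon^2}{m^2}\left(\cos^2(\phi+\delta\phi) + (\delta\psi)^2\right) - (\delta\psi)^2}}{\cos^2(\phi+\delta\phi) + (\delta\psi)^2}$$

$$\delta s \le \frac{\cos(\phi+\delta\phi)\cos\phi + \cos\phi\sqrt{\frac{4\epsilon^2}{m^2}\left(\cos^2(\phi+\delta\phi) + (\delta\psi)^2\right) - (\delta\psi)^2}}{\cos^2(\phi+\delta\phi) + (\delta\psi)^2}. \tag{37}$$

Thus, given bounds on $\delta\psi$, from which we can get a range of values for $\delta\phi$, we can compute
the range of values for $\delta s$.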
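The scale limits of equation (37) can be evaluated directly once the two angular deviations are fixed. The following is a minimal sketch; the function name `ds_bounds` and the sample values are assumptions made for illustration.

```python
import math

def ds_bounds(eps, m, phi, dphi, dpsi):
    """Return (lower, upper) limits on the multiplicative scale error,
    or None when the given deviations admit no real solution."""
    k2 = (2.0 * eps / m) ** 2
    c = math.cos(phi + dphi)
    q = c * c + dpsi * dpsi
    radicand = k2 * q - dpsi * dpsi
    if radicand < 0.0:
        return None
    root = math.cos(phi) * math.sqrt(radicand)
    centre = c * math.cos(phi)
    return (centre - root) / q, (centre + root) / q

# Hypothetical numbers: one pixel of uncertainty, a 50 pixel basis vector.
bounds = ds_bounds(eps=1.0, m=50.0, phi=0.5, dphi=0.0, dpsi=0.0)
print(bounds)
```

When both angular deviations are zero the limits reduce to one plus or minus twice the uncertainty divided by the basis length, here 0.96 and 1.04, which matches the intuition that a longer image basis vector pins down the scale more tightly.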
In sum, our task was to obtain reasonable bounds on the errors in the six parameters of
the transformation, namely translation, $\delta\psi$, $\delta\phi$, $\delta\theta$ and $\delta s$. The translation is known up to an
$\epsilon$-disc. For the rotations and scale, we used constraints that force $\mathbf{M}$ and $\mathbf{N}$ to project near
the uncertainty regions surrounding $\mathbf{m}$ and $\mathbf{n}$, respectively. Specifically, let $\mathbf{E}_M$ be the error
vector from $\mathbf{m}$ to the transformed and projected $\mathbf{M}$, and similarly for $\mathbf{E}_N$ and $\mathbf{n}$. We first used
a constraint on the magnitude of $\mathbf{E}_M$ in the direction of $\hat{m}^\perp$ to get bounds on $\delta\psi$. Then, using
the bounds on $\delta\psi$ plus a more general constraint on the magnitude of $\mathbf{E}_M$, we obtained a range
of values for $\delta\phi$. Next, we used the bounds on $\delta\psi$ again plus a constraint on the magnitude
of $\mathbf{E}_N$ in the direction of $\hat{m}^\perp$ to get a range of values for $\delta\theta$. Lastly, we bounded $\delta s$ using the
bounds on $\delta\psi$, $\delta\phi$, and the general constraint on the magnitude of $\mathbf{E}_M$.

Summary  of  Bounds  on  Parameters 

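To summarize, we have the following bounds on uncertainty in the transformation given $\epsilon$
uncertainty in the image measurements. The translation uncertainty is simply bounded by $\epsilon$.
The rotational uncertainty is:

$$|\delta\psi| \le \frac{2\epsilon}{m}\,|\cos\phi|, \tag{38}$$

and

$$\sin\delta\phi = -(1 + \mu)\sin\phi\cos\phi + \operatorname{sgn}(\tan\phi)\cos\phi\sqrt{1 - (1 + \mu)^2\cos^2\phi}$$

$$\cos\delta\phi = (1 + \mu)\cos^2\phi + \operatorname{sgn}(\tan\phi)\sin\phi\sqrt{1 - (1 + \mu)^2\cos^2\phi} \tag{39}$$

subject to the constraint that:

$$-\sqrt{\left(\frac{2\epsilon}{m}\right)^2 - \sec^2\phi\,(\delta\psi)^2} \;\le\; \mu \;\le\; \sqrt{\left(\frac{2\epsilon}{m}\right)^2 - \sec^2\phi\,(\delta\psi)^2},$$

and

$$\cos\delta\theta = a\cos\theta + \sigma\sin\theta\sqrt{1 - a^2}$$

$$\sin\delta\theta = -a\sin\theta + \sigma\cos\theta\sqrt{1 - a^2} \tag{40}$$

where

$$\sigma = \pm 1$$

$$a = \frac{nM\sin\omega}{mN\sin\xi}\cos\phi - \frac{\delta\psi}{\tan\xi} + \mu\,\frac{M\cos\phi}{N\sin\xi}$$

$$-\frac{2\epsilon}{m}\cos\phi \le \delta\psi \le \frac{2\epsilon}{m}\cos\phi$$

$$-\frac{2\epsilon}{m} \le \mu \le \frac{2\epsilon}{m}$$

and where $\mu$ is further constrained to keep the solutions real.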

The  uncertainty  in  scale  is  constrained  by 

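$$\delta s \;\ge\; \frac{\cos(\phi+\delta\phi)\cos\phi - \cos\phi\,\sqrt{\frac{4\epsilon^2}{m^2}\left(\cos^2(\phi+\delta\phi)+(\delta\psi)^2\right)-(\delta\psi)^2}}{\cos^2(\phi+\delta\phi)+(\delta\psi)^2}$$

$$\delta s \;\le\; \frac{\cos(\phi+\delta\phi)\cos\phi + \cos\phi\,\sqrt{\frac{4\epsilon^2}{m^2}\left(\cos^2(\phi+\delta\phi)+(\delta\psi)^2\right)-(\delta\psi)^2}}{\cos^2(\phi+\delta\phi)+(\delta\psi)^2} \qquad (41)$$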

We note that the bounds on the error for $\delta\theta$ and $\delta s$ are overly conservative, in that we
have not used all the constraints available to us in bounding these errors.
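
The rotational bounds can be evaluated in the same spirit. The sketch below assumes the reading of (38) and (39) in which $|\delta\psi| \le (2\epsilon/m)|\cos\phi|$, $\mu$ ranges over $\pm\sqrt{(2\epsilon/m)^2 - \sec^2\phi\,(\delta\psi)^2}$, and $\delta\phi$ is recovered from its sine and cosine. The function names are hypothetical, and the bound on $\delta\theta$ in (40) is not included.

```python
import math

def psi_bound(eps, m, phi):
    """Largest |delta-psi| allowed by (38)."""
    return (2.0 * eps / m) * abs(math.cos(phi))

def mu_limit(eps, m, phi, dpsi):
    """Half-width of the interval allowed for mu; None if the radicand is negative."""
    r = (2.0 * eps / m) ** 2 - (dpsi / math.cos(phi)) ** 2
    return math.sqrt(r) if r >= 0.0 else None

def delta_phi(phi, mu):
    """delta-phi from (39) for a given mu; None if the solution is not real."""
    k = (1.0 + mu) * math.cos(phi)
    r = 1.0 - k * k                      # 1 - (1 + mu)^2 cos^2(phi)
    if r < 0.0:
        return None
    sgn = math.copysign(1.0, math.tan(phi))   # taken as +1 when phi == 0
    root = math.sqrt(r)
    s = -k * math.sin(phi) + sgn * math.cos(phi) * root
    c = k * math.cos(phi) + sgn * math.sin(phi) * root
    return math.atan2(s, c)
```

Sweeping `mu` over `[-mu_limit, +mu_limit]` for a sampled `dpsi` and collecting the values of `delta_phi` gives the range of $\delta\phi$ for that sample; `mu = 0` returns zero rotation error, as expected for the nominal solution.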


Constraints  on  the  Analysis 

All  of  the  bounds  computed  are  overestimates,  up  to  a  few  reasonable  approximations  and 
with  the  possible  exception  of  the  scale  factor.  To  bring  everything  together,  we  now  list  and 
discuss  the  approximations  and  overestimates. 

In computing the formula for transforming and projecting a model point into the image,
we assumed that $|\delta\psi| \ll 1$, so that we could use the small angle approximation, which gave
$\cos\delta\psi \approx 1$ and $\sin\delta\psi \approx \delta\psi$, and so that we could drop higher order terms in $\delta\psi$.

Next, we list the sources of our overbounds on the parameters. First, we used a constraint
that the error vector ($E_m$) for projection of $M$ has magnitude at most $2\epsilon$. This is a weaker
constraint than requiring the destination point of the transformed and projected $M$ to be
within the $\epsilon$-circle surrounding the image point at the destination point of $m$.

The weak constraint on $E_m$ was used directly to bound both $\delta\phi$ and $\delta s$, but an even weaker
version was used to bound $\delta\psi$. The weaker version simply requires the magnitude of $E_m$ in the
direction of $\hat{m}^{\perp}$ to be at most $2\epsilon$. Similarly, $\delta\theta$ was bounded with a constraint that forces the
magnitude of $E_n$ in the direction of $\hat{m}^{\perp}$ to be at most $2\epsilon$. One indication that the constraint
on $E_n$ is weak is that it is independent of $\delta\phi$, the rotation about $\hat{m}^{\perp}$.

Further, another source of overbounding was the treatment of the constraints on $E_m$ and
$E_n$ as independent. In actuality the constraints are coupled.

Finally, there is one place where we did not clearly overbound the rotation errors, which
are $\delta\psi$, $\delta\theta$, and $\delta\phi$. In computing their ranges of values, we used the nominal value of the scale
factor, whose differences from the extreme values of the scale factor may not be insignificant.


6  Using  the  Bounds 

The  bounds  on  the  uncertainty  in  the  3D  pose  of  an  object,  computed  from  three  corresponding 
model  and  image  points,  have  a  number  of  applications.  They  can  be  used  to  design  careful 
verification algorithms, they can be used to design error-sensitive voting schemes, and they
can  be  used  to  analyze  the  expected  performance  of  recognition  methods.  In  this  section,  we 
consider  all  three  such  applications. 

6.1  3D  Hough  transforms 

We  begin  by  considering  the  impact  of  the  error  analysis  on  pose  clustering  methods,  such  as 
the  generalized  Hough  transform[2].  These  methods  seek  to  find  solutions  to  the  recognition 
problem  by  the  following  general  technique: 

1. Consider a pairing of $k$-tuples of model and image features, where $k$ is the smallest such
tuple that defines a complete transformation from model to image.

2. For each such pair, determine the associated transformation.

3. Use the parameters of the transformation to index into a hash space, and at the indexed
point, increment a counter. This implies that the pairing of $k$-tuples is voting for the
transformation associated with that index.

4. Repeat for all possible pairings of $k$-tuples.

5. Search the hash space for peaks in the stored votes, such peaks serving to hypothesize a
pose of the object.

While  the  generalized  Hough  transform  is  usually  used  for  matching  2D  images  to  2D 
objects  undergoing  rigid  transformations  in  the  plane,  or  for  matching  3D  objects  to  3D 
data,  it  has  also  been  applied  to  recognizing  3D  objects  from  2D  images  (e.g.  [24,  25,  15]). 
In  this  case,  each  dimension  of  the  Hough  space  corresponds  to  one  of  the  transformation 
parameters.  Under  a  weak  perspective  imaging  model  these  parameters  (as  we  saw  above) 
are  two  translations,  three  rotations,  and  a  scale  factor.  The  method  summarized  at  the  end 
of  Section  4  provides  one  technique  for  determining  the  values  of  these  parameters  associated 
with  a  given  triple  of  model  and  image  points. 

In  the  case  of  perfect  sensor  data,  the  generalized  Hough  method  generally  results  in 
correctly  identified  instances  of  an  object  model.  With  uncertainty  in  the  data,  however,  in 
steps  (2)  and  (3)  one  really  needs  to  vote  not  just  for  the  nominal  transformation,  but  for  the 
full range of transformations consistent with the pairing of a given $k$-tuple (in this case triple)
of  model  and  image  points.  Our  analysis  provides  a  method  for  computing  the  range  of  values 
in  the  transformation  space  into  which  a  vote  should  be  cast: 




1. Consider a pairing of 3-tuples of model and image features.

2. For each such pair, determine the associated transformation, using the method of Section
4 to determine the nominal values of the transformation, and using equations (38), (39),
(40) and (41) to determine the variation in each parameter.

3.  Use  the  parameters  of  the  transformation,  together  with  the  range  of  variation  in  each 
parameter,  to  index  into  a  hash  space,  and  at  the  indexed  point,  increment  a  counter. 

4.  Repeat  for  all  possible  pairings  of  3-tuples. 

5.  Search  the  hash  space  for  peaks  in  the  stored  votes,  such  peaks  serving  to  hypothesize  a 
pose  of  the  object. 
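
The five steps above can be sketched as follows. This is an illustrative outline only, not an implementation from this paper: the hash space is modelled as a sparse table of integer cell indices, each parameter is given a bin width, and a vote is cast into every cell overlapping the interval nominal value plus or minus its predicted variation. The names `cast_votes`, `peaks` and `bins_covering` are ours, and wrap-around of angular parameters is ignored for brevity.

```python
from collections import Counter
from itertools import product
import math

def bins_covering(lo, hi, width):
    """Indices of all bins of the given width that overlap [lo, hi]."""
    return range(math.floor(lo / width), math.floor(hi / width) + 1)

def cast_votes(table, nominal, spread, widths):
    """Vote for every cell overlapping nominal[i] +/- spread[i] in each dimension."""
    axes = [bins_covering(p - d, p + d, w)
            for p, d, w in zip(nominal, spread, widths)]
    for cell in product(*axes):
        table[cell] += 1

def peaks(table, threshold):
    """Cells whose vote count reaches the threshold."""
    return [cell for cell, votes in table.items() if votes >= threshold]
```

In the full problem `nominal`, `spread` and `widths` would each have six entries (two translations, three rotations, and the logarithm of scale), with `spread` filled in from the bounds summarized above. The number of cells touched per vote grows with the product of the per-parameter ranges, which is the source of the saturation analyzed in the next subsection.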

6.2  Effects  of  Error  Sensitivity  on  the  Hough  Transform 

Unfortunately,  allowing  for  uncertainty  in  the  data  dramatically  increases  the  chances  of  false 
peaks  in  the  vote  in  hash  space,  because  each  tuple  of  model  and  image  points  votes  for  a 
(possibly  large)  range  of  transformations. 

Earlier  analysis  of  this  effect  [12]  has  shown  that  the  sensitivity  of  the  Hough  transform  as 
a  tool  for  object  recognition  depends  critically  on  its  redundancy  factor,  defined  as  the  fraction 
of  the  pose  space  into  which  a  single  data-model  pairing  casts  a  vote.  This  previous  analysis 
was  done  for  2D  objects  and  2D  images,  and  for  3D  objects  and  3D  images.  Here  we  examine 
the  impact  of  this  effect  on  using  the  Hough  transform  for  recognizing  3D  objects  from  2D 
images.  We  use  the  analysis  of  the  previous  section  to  determine  the  average  fraction  of  the 
parameter space specified by a triple of model and image points, given $\epsilon$-bounded uncertainty
in  the  sensing  data.  (In  [12],  experimental  data  from  [25]  were  used  to  analyze  the  behavior, 
whereas  here  we  derive  analytic  values.) 

To do this, we simply find the expected range of values for $\delta\psi$, as given by equation (16),
for $\delta\phi$, as given by equation (30), and for $\delta\theta$, as given by equations (33) and (34). We could
do  this  by  actually  integrating  these  ranges  for  some  distribution  of  parameters.  An  easier 
way  of  getting  a  sense  of  the  method  is  to  empirically  sample  these  ranges.  We  have  done  this 
with  the  following  experiment.  We  created  a  set  of  model  features  at  random,  then  created 
sets  of  image  features  at  random.  We  then  selected  matching  triples  of  points  from  each  set, 
and  used  them  to  compute  a  transformation,  and  the  associated  error  bounds.  For  each  of  the 
rotational  parameters,  we  measured  the  average  range  of  variation  predicted  by  our  analysis. 
The positional uncertainty in the sensory data was set to be $\epsilon = 1$, 3 or 5. The results are
summarized in Table 1, where we report both the average range of uncertainty in angle (in
radians), and the average range normalized by $2\pi$ in the case of $\theta$ and $\psi$, and by $\pi$ in the case
of $\phi$ (since it is restricted to the range $-\pi/2 \le \phi \le \pi/2$).

Table 1: Ranges of uncertainty in the transformation parameters. Listed are the average range of uncertainty
in each of the rotation parameters and the range of uncertainty in the multiplicative scale factor. The ranges
are also normalized to the total possible range for each parameter (see text for details). Also indicated are $b_r$,
the redundancy in rotation, $b_t$, the redundancy in translation, assuming that the image dimension is $D = 500$,
and $b$, the overall redundancy. Tables are for $\epsilon = 1$, 3 and 5 respectively.

The product of the three rotation terms, which we term $b_r$, defines the average fraction of
the rotational portion of the pose space that is consistent with a pairing of model and image
3-tuples.

To  get  the  overall  redundancy  factor  (the  fraction  of  the  transformation  space  that  is 
consistent  with  a  given  pairing  of  model  and  sensor  points),  we  must  also  account  for  the 
translational  and  scale  parameters.  If  D  is  the  size  of  each  image  dimension,  then  the  fraction 
of  the  translational  portion  of  pose  space  consistent  with  a  given  pairing  of  three  model  and 
image  points  is 


In the examples reported in Table 1, we used a value of $D = 500$, where the largest possible
distance  between  model  features  in  the  image  was  176  pixels. 

To estimate the range of uncertainty in scale, we use the following method. Since the scale
factor is a multiplicative one, we use $\log s$ as the 'key' to index into the hash space, so that
when we account for uncertainty in scale, $s\,\delta s$ is transformed into $\log s + \log\delta s$. If we assume
that $s_{\max}$ and $s_{\min}$ denote the maximum and minimum allowed scale factors, then the fraction
of the scale dimension covered by the computed uncertainty range is

$$b_s = \frac{\log\delta s_{\max} - \log\delta s_{\min}}{\log s_{\max} - \log s_{\min}}.$$

In the case of the experiments described in Table 1, we used $s_{\max} = .13$ and $s_{\min} = .03$.
              $\epsilon = 1$     $\epsilon = 3$     $\epsilon = 5$
Hough         1.984e-15     1.447e-12     3.101e-11
Estimate      1.961e-11     4.710e-9      4.777e-8

Table 2: Comparing fractions of the pose space consistent with a match of 3 image and model points. The
Hough line indicates the average size of such regions for different amounts of sensor uncertainty. The Estimate
line indicates the corresponding sizes using the uncertainty bounds on the transformation parameters derived
in the previous section.

The  overall  redundancy  (fraction  of  the  transformation  space  consistent  with  a  given  triple 
of  model  and  image  points)  is 

$$b = b_r\,b_t\,b_s. \qquad (42)$$

Values  for  b  are  reported  in  Table  1. 
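
As an illustration of how the pieces of equation (42) combine, the following sketch computes the rotational and scale fractions from measured ranges and multiplies them with a translational fraction. The function names are ours. The translational fraction `b_t` is taken as an input here rather than computed, and the normalizations ($2\pi$ for $\theta$ and $\psi$, $\pi$ for $\phi$, and logarithms for scale) follow the text above.

```python
import math

def rotation_redundancy(range_theta, range_psi, range_phi):
    """b_r: product of the rotation ranges, each normalized by its full extent."""
    return ((range_theta / (2.0 * math.pi))
            * (range_psi / (2.0 * math.pi))
            * (range_phi / math.pi))

def scale_redundancy(ds_min, ds_max, s_min, s_max):
    """b_s: fraction of the log-scale axis covered by the range [ds_min, ds_max]."""
    return (math.log(ds_max) - math.log(ds_min)) / (math.log(s_max) - math.log(s_min))

def overall_redundancy(b_r, b_t, b_s):
    """b = b_r * b_t * b_s, as in equation (42)."""
    return b_r * b_t * b_s
```

For example, `scale_redundancy(0.98, 1.02, 0.03, 0.13)` gives the fraction of the scale axis covered when the multiplicative scale error lies between 0.98 and 1.02 and the allowed scale factors are those used in the experiments.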

For this particular example, one can see that while the uncertainty in $\delta\psi$ is quite small,
the uncertainty in the other two rotational parameters can be large, especially for large values
of $\epsilon$. The redundancy in translation is a bit misleading, since it depends on the relative size of
the object to the image. The uncertainty in scale, normalized to the total range of scale, can
in principle be quite large, though this also depends on the total range of possible values.
Note how dramatically the redundancy jumps in going from $\epsilon = 1$ to $\epsilon = 3$.

Evaluating  the  analysis 

We  are  overestimating  the  region  of  possible  transformations,  and  one  obvious  question  is 
how  bad  is  this  overestimate.  We  can  explore  this  by  the  following  alternative  analysis.  We 
are  basically  considering  the  following  question:  Given  3  model  points  and  3  matching  image 
points,  what  is  the  fraction  of  the  space  of  possible  transformations  that  is  consistent  with 
this  match,  given  e  uncertainty  in  the  sensed  data?  This  is  the  same  as  asking  the  following 
question:  For  any  pose,  what  is  the  probability  that  that  pose  applied  to  a  set  of  3  model  points 
will  bring  them  into  agreement,  modulo  sensing  uncertainty,  with  a  set  of  3  image  points?  If 
D  is  the  linear  dimension  of  the  image  (or  the  fraction  of  the  image  being  considered),  then 
this probability, under a uniform distribution assumption on image points, is

$$\left[\frac{\pi\epsilon^2}{D^2}\right]^3 \qquad (43)$$

because the probability of any of the transformed points matching an image point is just the
probability that it falls within the $\epsilon$ error disc, and by uniformity this is just the ratio of the
area of that disc to the area of the image region.
The expression in equation (43) defines the best that we could do, if we were able to
exactly  identify  the  portion  of  pose  space  that  is  consistent  with  a  match.  To  see  how  badly 
we  are  overestimating  this  region,  we  compare  the  results  of  Table  1  with  those  predicted  by 
this  model,  as  shown  in  Table  2.  One  can  see  from  this  table  that  our  approximate  method 
overestimates  by  about  a  factor  of  1000.  Considering  this  is  distributed  over  a  6  dimensional 
space,  this  implies  we  are  overestimating  each  parameter’s  range  by  about  a  factor  of  3.  These 
numbers  are  potentially  misleading  since  they  depend  on  the  relative  size  of  the  object  to  the 
image.  Nonetheless,  they  give  an  informal  sense  of  the  difference  between  the  ideal  Hough  case 
and  the  estimates  obtained  by  this  method. 
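
The ideal fraction is easy to evaluate directly. The sketch below assumes that equation (43) is the cube of the ratio of the $\epsilon$-disc area to the image area, which is what the accompanying sentence describes; with $D = 500$ this reproduces the Hough row of Table 2 to the printed precision. The helper `per_parameter_factor` (our name) spreads an overall overestimate evenly across the six pose dimensions.

```python
import math

def ideal_fraction(eps, D):
    """Probability that three uniformly placed points each fall in an eps-disc."""
    return (math.pi * eps ** 2 / D ** 2) ** 3

def per_parameter_factor(overall_ratio, dims=6):
    """Per-dimension overestimate if the overall ratio is spread evenly over dims."""
    return overall_ratio ** (1.0 / dims)
```

An overall factor of 1000 corresponds to `per_parameter_factor(1000.0)`, roughly 3.2 per parameter, in line with the factor of about 3 quoted above.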


Using  the  analysis 


Although  these  values  may  seem  like  an  extremely  small  fraction  of  the  6  dimensional  space 
that  is  filled  by  any  one  vote  for  a  pairing  of  3-tuples,  it  is  important  to  remember  that  under 
the  Hough  scheme,  all  possible  pairings  of  3-tuples  cast  votes  into  the  space,  and  there  are 
on the order of $m^3 s^3$ such tuples, where $m$ is the number of known model features, and $s$ is
the  number  of  measured  sensory  features.  By  this  analysis,  each  such  tuple  will  actually  vote 
for  a  fraction  b  of  the  overall  hash  space.  Even  in  the  ideal  limiting  case  of  infinitesimal  sized 
buckets  in  the  transformation  space,  there  is  likely  to  be  a  significant  probability  of  a  false 
peak. 

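To see this, we can apply the analysis of [12]. In particular, the probability that a point in
the pose space will receive a vote of size $j$ can be approximated by the geometric distribution

$$p_j \approx \frac{\lambda^j}{(1+\lambda)^{j+1}}, \qquad (44)$$

where $\lambda = m^3 s^3 b$. The probability of at least $\ell$ votes at a point is then

$$q_\ell = 1 - \sum_{j=0}^{\ell-1} p_j = \left(\frac{\lambda}{1+\lambda}\right)^{\ell}. \qquad (45)$$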

That is, this expression denotes the fraction of the cells in pose space that will have votes
at least as large as $\ell$. In most recognition systems, it is common to set a threshold, $t$, on
the  minimum  size  correspondence  that  will  be  accepted  as  a  correct  solution.  Thus  we  are 
concerned  with  the  probability  of  a  false  positive,  i.e.  a  set  of  at  least  t  feature  pairings 
accidentally  masquerading  as  a  correct  solution.  Suppose  we  set  this  threshold  by  assuming 
that  a  correct  solution  will  have  pairings  of  image  features  for  t  =  fm  of  the  model  features, 
where $f$ is some fraction, $0 \le f \le 1$. Since we are using triples to define entries into the hash
table, there will be $\binom{x}{3}$, or on the order of $x^3$, votes cast at any point that is consistent with a
transformation aligning $x$ of the model features with image features. Thus, if we want the
probability of a false positive accounting for $fm$ of the model features to be less than some
bound $\delta$, we need

$$\left(\frac{\lambda}{1+\lambda}\right)^{f^3 m^3} \le \delta. \qquad (46)$$

                 $\epsilon = 1$                 $\epsilon = 3$                 $\epsilon = 5$
$\delta$        f = .25    .5    .75     f = .25    .5    .75     f = .25    .5    .75
10^-2        557   1114   1672         90    179    269         41     83    124
10^-3        487    974   1460         78    157    235         36     72    109
10^-4        442    885   1327         71    142    213         33     66     99

Table 3: Approximate limits on the number of sensory features, such that the probability of a false positive
of size $fm$ is less than $\delta$, shown for different fractions $f$, and different thresholds $\delta$. The tables are for
errors of $\epsilon = 1$, 3 and 5 respectively.


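Substituting for $\lambda$ and rearranging the equation leads to the following bound on the number
of sensory features that can be tolerated under these conditions:

$$s \le \frac{1}{m}\left[\frac{1}{b\left(\delta^{-1/(f^3 m^3)} - 1\right)}\right]^{1/3}. \qquad (47)$$

Following the analysis of [12] we can use the series expansion

$$e^x = \sum_{j=0}^{\infty} \frac{x^j}{j!}$$

together with a Taylor series expansion, to approximate this bound on $s$ by:

$$s_{\lim} \approx \frac{f}{\left[b\,\ln(1/\delta)\right]^{1/3}}. \qquad (48)$$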


Note that to a first order approximation, the limit on the number of sensory features that can
be tolerated, while keeping the probability of a false positive of size $fm$ below some threshold
$\delta$, is independent of the number of model features $m$, and only depends on the redundancy $b$
(and hence the uncertainty $\epsilon$), the fraction $f$ of the model to be matched, and the threshold $\delta$.
To get a sense of the range of values for $s$, we chart in Table 3 the limiting values for $s$ based
on equation (48) and using values for $b$ from Table 1.
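
The first-order limit can be evaluated with a few lines of code. The sketch below assumes the form $s_{\lim} \approx f / [b \ln(1/\delta)]^{1/3}$, which is what the geometric vote model gives to first order when a correct pose collects on the order of $(fm)^3$ votes; the function name is ours. Using the redundancy values of the Estimate row of Table 2 (1.961e-11, 4.710e-9 and 4.777e-8 for $\epsilon = 1$, 3 and 5), it agrees with the entries of Table 3 after rounding.

```python
import math

def sensor_limit(f, b, delta):
    """First-order limit on the number of sensory features.

    f     -- fraction of the model that must be matched
    b     -- redundancy (fraction of pose space hit by one vote)
    delta -- allowed probability of a false positive
    """
    return f / (b * math.log(1.0 / delta)) ** (1.0 / 3.0)
```

Because the limit scales as $b^{-1/3}$, shrinking the redundancy by a factor of 1000 raises the tolerable number of sensory features only tenfold.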

One  can  see  from  this  that,  except  in  the  case  of  very  small  sensor  uncertainty,  the  Hough 
space  very  rapidly  saturates,  largely  because  of  the  number  of  cases  that  cast  votes  into 
the space. As a consequence, the 3D-from-2D Hough transform will perform well only if some
other  process  pre-selects  a  subset  of  the  sensor  features  for  consideration.  Note  that  these 
numbers  are  based  on  the  redundancy  values  associated  with  our  derived  approximation  for 
the  volume  of  Hough  space  associated  with  each  pairing  of  model  and  image  triples  of  features. 
As  we  saw,  in  the  ideal  case,  the  actual  volume  of  Hough  space  is  smaller,  by  a  factor  of  about 
1000  (Table  2).  This  means  that  the  values  for  the  limits  on  sensor  clutter  in  Table  3  in  the 
ideal  case  will  be  larger  by  roughly  a  factor  of  10.  (Of  course,  this  also  requires  that  one 
can  find  an  efficient  way  of  exactly  determining  the  volume  of  Hough  space  consistent  with  a 
pairing  of  image  and  model  features.)  For  the  larger  uncertainty  values,  this  still  leaves  fairly 
tight limits on the amount of sensory data that can be accommodated. This analysis supports
our  earlier  work  [12],  in  which  we  showed  using  empirical  data,  that  the  Hough  transform 
works  well  only  when  there  is  limited  noise  and  scene  clutter. 

6.3  3D  Alignment 

Alignment  methods  differ  from  pose  clustering,  or  generalized  Hough,  methods  in  that  the 
computed  transformation  is  directly  applied  to  the  model,  and  used  to  check  for  additional 
corresponding  model  and  image  features.  In  order  to  analyze  the  effects  of  sensory  uncertainty 
on  this  type  of  recognition  method,  we  need  to  know  what  happens  when  a  model  point  is 
transformed  and  projected  into  the  image.  That  is,  what  is  the  range  of  positions,  about  the 
nominal  correct  position,  to  which  a  transformed  model  point  can  be  mapped?  Determining 
this  allows  us  to  design  careful  verification  algorithms,  in  which  minimal  regions  in  the  image 
are  examined  for  supporting  evidence. 

In  this  section,  we  describe  how  to  use  our  analysis  to  bound  the  range  of  possible  positions 
of  a  given  model  point  that  is  projected  into  the  image.  In  addition,  we  illustrate  how  the 
bounding  regions  we  compute  compare  to  the  true  regions  of  uncertainty.  Lastly,  we  look  at 
the implications of using our bounds to perform alignment-based recognition. In particular,
we  compute  the  probability  of  a  false  positive  match,  which  is  when  model  points  transformed 
under  an  incorrect  transform  are  aligned  to  random  image  points  up  to  error. 

A  Simple  Verification  Algorithm 

In  the  Alignment  Method,  pairs  of  3  model  and  image  points  are  used  to  hypothesize  the  pose 
of  an  object  in  the  image.  In  other  words,  a  method  such  as  the  one  described  in  section 
4  is  used  to  compute  the  transformation  associated  with  this  correspondence,  and  then  that 
transformation  is  applied  to  all  model  features,  thereby  mapping  them  into  the  image.  To 
verify such an hypothesis, we need to determine which of the aligned features have a match in
the  image.  While  this  should  really  be  done  on  edge  features,  here  we  describe  a  method  for 
verification  from  point  features  (the  edge  case  is  the  subject  of  forthcoming  work).  The  key  is 
to  approximate  the  smallest  sized  region  in  which  to  search  for  a  match  for  each  image,  such 
that  the  correct  match,  if  it  exists,  will  not  be  missed.  We  can  approximate  such  regions  as 
follows: 

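1. Decompose each model point $P$ into the natural coordinate system, so that it transforms
as:

$$\alpha\hat{m} + \beta\hat{m}^{\perp} + \gamma\hat{z} \;\mapsto\; s\,\Pi\left[\alpha\hat{m}'' + \beta\hat{m}^{\perp\prime\prime} + \gamma\hat{z}''\right].$$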

The  transformation  of  the  basis  vectors  is  given  by  equations  (9),  (10)  and  (11).  This 
allows  us  to  determine  the  position  of  the  nominally  transformed  model  point. 

2. Select sample values for $\delta\psi$, at some spacing, subject to the conditions of equation (38).

3.  For  each  value,  use  equations  (39),  (40)  and  (41)  to  compute  bounds  on  the  variation 
in  the  other  error  parameters.  This  leads  to  a  set  of  extremal  variations  on  each  of  the 
parameters. 

4. For each collection of error values in this set, perturb the nominal transformation
parameters, and compute a new position for the transformed model point. Take the difference
from the nominal point to determine an error offset vector.

5. Expand each error offset vector outward from the nominal point by an additional offset
of $2\epsilon$ to account for the translational uncertainty and the inherent uncertainty in sensing
the point.

6.  Add  each  error  vector  to  the  nominal  point  in  the  image,  and  take  the  convex  hull  of  the 
result  to  get  a  good  estimate  of  the  range  of  feasible  positions  associated  with  a  projected 
model  point. 

7.  Search  over  this  region  for  a  matching  image  point. 

8.  If  sufficiently  many  projected  model  points  have  a  match,  accept  the  hypothesized  pose. 
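
Steps 4 through 7 of this procedure can be sketched as follows. The sketch does not reproduce the projection of equations (9) to (11); instead it assumes the caller has already computed the image positions of the model point under each perturbed transformation (the argument `perturbed`). All function names are ours, and the convex hull is computed with the standard monotone chain method.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def uncertainty_region(nominal, perturbed, eps):
    """Offsets from the nominal point, each lengthened by 2*eps, then hulled."""
    grown = []
    for (x, y) in perturbed:
        dx, dy = x - nominal[0], y - nominal[1]
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            continue  # a zero offset has no outward direction to extend
        k = (norm + 2.0 * eps) / norm
        grown.append((nominal[0] + k * dx, nominal[1] + k * dy))
    return convex_hull(grown)

def contains(hull, q):
    """True if q lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return n >= 3 and all(
        (hull[(i + 1) % n][0] - hull[i][0]) * (q[1] - hull[i][1])
        - (hull[(i + 1) % n][1] - hull[i][1]) * (q[0] - hull[i][0]) >= 0
        for i in range(n))
```

A candidate image point `q` is then accepted as a match for the projected model point when `contains(uncertainty_region(nominal, perturbed, eps), q)` holds.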

An example of this is shown in Figure 2. The figure was created by taking a random set
of 3D points as a model, arbitrarily rotating and translating them, projecting them into the
image and scaling with a random scale factor between .05 and .1, and perturbing the result
randomly with error vectors of magnitude at most $\epsilon$, resulting in a set of corresponding data
points. In this figure, the open circles represent a set of image points, each displayed as an
$\epsilon$-disc. The process is to match 3 model points to 3 data points whose positions are known
up to $\epsilon$-circles. Then, the match is used together with the parameters of the transformation
to compute the uncertainty regions (displayed as polygons) and the crosses, which lie at the
nominal location of the model. Note that the image points corresponding to the model points
could fall anywhere within the polygons, so that simply searching an $\epsilon$-circle for a match would
not be sufficient.

Figure 2: Two examples of uncertainty regions, with perturbation of the data.

One  can  see  that  the  uncertainty  regions  vary  considerably  in  size.  To  get  a  sense  of  this 
variation,  we  ran  a  series  of  trials  as  above,  and  collected  statistics  on  the  areas  associated 
with  each  uncertainty  region,  over  a  large  number  of  different  trials  of  model  and  image  points. 
We can histogram the areas of the observed discs, in terms of $\pi(2\epsilon)^2$ (the size of the basic disc
of uncertainty). A sample histogram, normalized to sum to 1, is shown in Figure 3, and was
based on 10000 different predicted regions of uncertainty. For the case of $\epsilon = 5$, the expected
area of an uncertainty region is 2165 square pixels. For the case of $\epsilon = 3$, the expected area of
an uncertainty region is 1028 square pixels. For $\epsilon = 1$, the expected area is 195 square pixels.
In all cases, the maximum separation of image features was $m_{\max} = 176$.

Using  the  analysis 

One  advantage  of  knowing  the  bounds  on  uncertainty  in  computing  a  transform  is  that  they 
can  be  used  to  overestimate  the  regions  of  the  image  into  which  aligned  features  project.  This 
gives  us  a  way  of  designing  careful  verification  systems,  in  which  we  are  guaranteed  to  find  a 
correct  corresponding  image  feature,  if  it  exists,  while  at  the  same  time  keeping  the  space  over 
which  to  search  for  such  features  (and  thus  the  number  of  false  matches)  relatively  small.  Of 
course, we know that our estimates err on the high side, and it is useful to see how much our
projected image regions overestimate the range of possible positions for matches.

Figure 3: Graph of the distribution of areas of uncertainty regions, as measured in the image, for the case of
$\epsilon = 3$. The horizontal axis is in units of $\pi(2\epsilon)^2$. The vertical axis records the fraction of the distribution
of uncertainty regions with that size.

To  do  this,  we  have  run  the  following  experiment.  We  took  a  3D  model  of  a  telephone, 
and  created  an  image  of  that  model  under  some  arbitrary  viewing  condition.  We  then  chose  a 
corresponding  triple  of  image  and  model  features  and  used  the  method  described  here  both  to 
determine  the  alignment  transformation  of  the  model  and  to  determine  our  estimates  of  the 
associated uncertainty regions for each feature, based on assuming $\epsilon$-discs of uncertainty in the
image features. For comparison, we took a sampling of points on the boundary of the $\epsilon$-disc
around each of the basis image points, computed the associated alignment transformation,
and projected each additional model feature into the image. We collected the set of positions
for each projected model point as we allowed the basis points to vary over their $\epsilon$-discs, and
used  this  to  create  regions  of  uncertainty  about  each  aligned  point.  This  should  be  a  very  close 
approximation  to  the  actual  region  of  uncertainty.  We  compare  these  regions  to  our  estimated 
regions  in  Figure  4.  One  can  see  that  our  method  does  overestimate  the  uncertainty  regions, 
although  not  drastically. 

Finally,  we  can  use  our  estimates  of  the  uncertainty  regions  to  estimate  the  probability 
that  a  random  pairing  of  model  and  image  bases  will  collect  votes  from  other  model  points. 
That  is,  if  we  use  a  random  alignment  of  the  model  and  project  the  remaining  transformed 
model  points  into  the  image,  on  average  each  such  point  will  define  an  uncertainty  region  of 
the  size  computed  above.  If  we  consider  an  image  of  dimension  D  =  500,  then  the  selectivity 
of  the  method  (i.e.  the  probability  that  each  model  point  will  find  a  potentially  matching 
image point in its uncertainty region) is 0.000781, 0.00411 and 0.00866 for $\epsilon = 1$, 3 and 5
respectively. By comparison, the selectivity for the case of a planar object in arbitrary 3D
position and orientation, for the same level of sensor uncertainty, is 0.000117, 0.001052 and
0.002911 respectively (Table 1 of [14]). Although they represent overestimates, these results
suggest that the selectivity of recognition methods applied to 3D objects should be only slightly
worse than when applied to 2D objects.

Figure 4: Comparison of ideal uncertainty regions with estimated regions. Each feature in the model is
projected according to the nominal transformation, as illustrated by the points. The dark circles show the
ideal regions of uncertainty. The larger enclosing convex regions show the estimated uncertainty regions
computed by our method. Two different solutions are shown.
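
The selectivity figures quoted here are the ratio of the expected area of an uncertainty region to the area of the image. A minimal check, with a function name of our choosing and the expected areas reported above:

```python
def selectivity(region_area, D):
    """Chance that a uniformly placed image point lands in a region of the given area."""
    return region_area / (D * D)

# Expected region areas (square pixels) reported for eps = 1, 3 and 5, with D = 500.
for eps, area in ((1, 195), (3, 1028), (5, 2165)):
    print(eps, selectivity(area, 500))
```

The small difference from 0.000781 in the $\epsilon = 1$ case comes from the area being quoted to the nearest square pixel.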

To  see  this,  we  can  use  the  analysis  of  [14]  to  estimate  limits  on  the  number  of  image 
features  that  can  be  tolerated,  while  maintaining  a  low  false  positive  rate.  Recapping  from 
that  earlier  work,  the  false  positive  rate  is  computed  by  the  following  method: 

1.  The selectivity of the method is defined by the probability that the uncertainty region
associated with a projected model point contains an image point, and this is just the
redundancy (fraction of the transformation space that is consistent with a given triple
of model and image points) b, as defined in equation (42).

2.  Since each model point is projected into the image, the probability that a given model
point matches at least one image point is

p = 1 - (1 - b)^{s-3}

because the probability that a particular model point is not consistent with a particular
image point is (1 - b) and, by independence, the probability that all s - 3 points are not
consistent with this model point is (1 - b)^{s-3}.

3.  The process is repeated for each model point, so the probability of exactly k of them
having a match is

q_k = \binom{m-3}{k} p^k (1 - p)^{m-3-k}    (49)

Further, the probability of a false positive identification of size at least k is

w_k = 1 - \sum_{i=0}^{k-1} q_i

Note that this is the probability of a false positive for a particular sensor basis and a
particular model basis.

4.  This process can be repeated for all choices of model bases, so the probability of a false
positive identification for a given sensor basis with respect to any model basis is

e_k = 1 - (1 - w_k)^{\binom{m}{3}}    (50)

Thus, we can compute limits on s such that e_k < δ, where δ is some threshold on the false
positive rate, and where k is taken to be fm for some fraction 0 < f < 1. In Table 4, we list
these limits, computed using equation (50) and values of b obtained from the ratio of areas
described in equation (42).

Table 4: Approximate limits on the number of sensory features, such that the probability of a
false positive of size fm is less than δ, shown for different fractions f, and different
thresholds δ. The tables are for errors of ε = 1, 3 and 5 respectively, and for m = 200.
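
The four steps above can be sketched numerically as follows. This is a minimal illustration and not the procedure used to generate Table 4. It assumes the binomial form of equation (49) over the m - 3 non-basis model points and \binom{m}{3} model bases in equation (50). The function names, the rounding of k = fm with `ceil`, and the search cap `s_max` are all hypothetical choices. The tail w_k is summed directly over i >= k, which equals 1 - \sum_{i<k} q_i but avoids cancellation when w_k is very small.

```python
import math


def match_probability(b, s):
    """Step 2: p = 1 - (1 - b)^(s - 3), for redundancy 0 < b < 1 and s >= 3."""
    return -math.expm1((s - 3) * math.log1p(-b))


def log_q(n, i, p):
    """Log of the binomial term C(n, i) p^i (1 - p)^(n - i), for 0 < p < 1."""
    log_comb = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return log_comb + i * math.log(p) + (n - i) * math.log1p(-p)


def false_positive_one_basis(m, k, p):
    """Step 3: w_k, the chance that at least k of the m - 3 points match."""
    n = m - 3
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    # Upper tail summed directly; equal to 1 - sum_{i<k} q_i.
    return min(1.0, sum(math.exp(log_q(n, i, p)) for i in range(k, n + 1)))


def false_positive_any_basis(m, k, p):
    """Step 4: e_k = 1 - (1 - w_k)^C(m, 3), over all model bases."""
    w = false_positive_one_basis(m, k, p)
    if w >= 1.0:
        return 1.0
    return -math.expm1(math.comb(m, 3) * math.log1p(-w))


def max_sensor_features(b, m, f, delta, s_max=1_000_000):
    """Largest s in [3, s_max] with e_k < delta, where k = ceil(f * m)."""
    k = math.ceil(f * m)

    def acceptable(s):
        return false_positive_any_basis(m, k, match_probability(b, s)) < delta

    if not acceptable(3):
        return None
    if acceptable(s_max):
        return s_max
    lo, hi = 3, s_max  # invariant: acceptable(lo) and not acceptable(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if acceptable(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Redundancy values quoted in the text for errors of 1, 3 and 5 pixels.
    for b in (0.000781, 0.00411, 0.00866):
        print(b, max_sensor_features(b, m=200, f=0.5, delta=0.01))
```

The bisection relies on e_k being non-decreasing in s: adding sensor features can only raise p, and hence the chance of accumulating k accidental matches.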

While these results give a sense of the limits on alignment, they are potentially slightly
misleading. What they say is that if we use the derived bounds to compute the uncertainty
regions in which to search for possible matches, and we use no other information to evaluate a
potential match, then the system saturates fairly quickly. As we know from Figure 4, however,
our method overestimates the uncertainty regions, and a more correct method, such as that
described earlier in which one uses samples of the basis uncertainty regions to trace out ideal
uncertainty regions for other points, would lead to much smaller uncertainty regions, smaller
values for the redundancy b, and hence a more forgiving verification system. Also, one could
clearly augment the test described here to incorporate additional constraints on the pose and
its uncertainty that can be obtained by using additional matches of model and sensory features
to further limit the associated uncertainty (e.g. given 4 pairs of matched points, use sets of 3
to compute regions of uncertainty in pose space, intersect these regions and use the result to
determine the pose's uncertainty).
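
The intersection idea in the parenthetical example might be sketched as below. This is only an illustration under strong assumptions: each pose-space uncertainty region is approximated by an axis-aligned bounding box (lows, highs) over the pose parameters, and the callable `region_for_triple`, which would compute such a box from three matched point pairs, is hypothetical and supplied by the caller. The true regions are not boxes, so the result is at best a conservative outer bound.

```python
from itertools import combinations


def intersect_boxes(boxes):
    """Intersect axis-aligned boxes given as (lows, highs); None if empty."""
    lows = [max(vals) for vals in zip(*(lo for lo, _ in boxes))]
    highs = [min(vals) for vals in zip(*(hi for _, hi in boxes))]
    if any(lo > hi for lo, hi in zip(lows, highs)):
        return None  # the regions are mutually inconsistent
    return lows, highs


def pose_uncertainty(matches, region_for_triple):
    """Intersect the pose regions obtained from every triple of matches.

    matches: sequence of matched (model point, image point) pairs.
    region_for_triple: hypothetical callable mapping a 3-tuple of matches
        to a bounding box (lows, highs) in pose space.
    """
    if len(matches) < 3:
        raise ValueError("need at least 3 matched pairs")
    boxes = [region_for_triple(t) for t in combinations(matches, 3)]
    return intersect_boxes(boxes)
```

With 4 matched pairs this intersects the regions from the 4 possible triples. An empty intersection would indicate that the 4 matches cannot all be consistent with a single pose under the assumed error bound.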

7  Summary 

A number of object recognition systems compute the pose of a 3D object using a small
number of corresponding model and image points. When there is uncertainty in the sensor
data, this can cause substantial errors in the computed pose. We have derived expressions
bounding the extent of uncertainty in the pose, given ε-bounded uncertainty in the
measurement of image points. The particular pose estimation method that we analyzed is that
of [12, 13], which determines the pose from 3 corresponding model and image points under a
weak perspective imaging model. Similar analyses hold for other related methods of estimating
pose.

We then applied this analysis to assess the effectiveness of two classes of recognition
methods that use pose estimates computed in this manner: the generalized Hough
transform and alignment. We found that in both cases, the methods have a substantial chance
of making a false positive identification (claiming an object is present when it is not), for even
moderate levels of sensor uncertainty (a few pixels).